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We expand upon the recent semi-stochastic adaptation to full configuration interaction quantum Monte 
Carlo (FCIQMC). We present an alternate method for generating the deterministic space without a priori 
knowledge of the wave function and present stochastic efficiencies for a variety of both molecular and lattice 
systems. The algorithmic details of an efficient semi-stochastic implementation are presented, with particular 
consideration given to the effect that the adaptation has on parallel performance in FCIQMC. We further 
demonstrate the benefit for calculation of reduced density matrices in FCIQMC through replica sampling, 
where the semi-stochastic adaptation seems to have even larger efficiency gains. We then combine these ideas 
to produce explicitly correlated corrected FCIQMC energies for the Beryllium dimer, for which stochastic 
errors on the order of wavenumber accuracy are achievable. 


I. INTRODUCTION 

Projector quantum Monte Carlo (QMC) methods are 
important tools in calculating accurate properties of 
quantum systemsii^— Such methods involve stochastically 
applying a projection operator, P, such that the desired 
evolution is achieved on average. This leads to a stochas¬ 
tic and sparse sampling of the object under considera¬ 
tion, thus reducing the associated memory requirement 
and often allowing for the study of larger systems than 
possible with exact, deterministic approaches. While this 
approach is beneficial in granting access to such systems, 
the stochastic error decays slowly with simulation time; 
increasing the efficiency of the sampling therefore allows 
greater statistical accuracy to be obtained. 

A recent projector QMC method, full configuration 
interaction quantum Monte Carlo (FCIQMC)^^— , has 
been greatly successful in the highly-accurate study of 
many challenging systems, providing FCTquality results 
for systems well out of reach of traditional deterministic 
FCI approaches. While many traditional projector QMC 
methods such as diffusion Monte Carlo (DMC) sample 
the wave function in real space, FCIQMC performs the 
sampling in a space of discrete basis states. This discrete 
sampling of the wave function allows efficient annihilation 
to take place between the walkers, greatly ameliorating 
the sign problem in many situations^ and removing the 
need for a fixed node approximation. 

A recent article by Petruzielo et al^ provided a number 
of significant advances in FCIQMC, including the intro¬ 
duction of a semi-stochastic approach. In this approach 
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the basis states forming the FCI space are divided into 
two sets. The projection in the space of states in one 
set, whose states are deemed to be most significant, is 
performed exactly. The rest of the projection operator 
is applied stochastically as in the traditional FCIQMC 
algorithm. By performing projection in the most impor¬ 
tant region of the space exactly, the stochastic error on 
results can be significantly reduced. As the additional 
memory requirements need not be overwhelmingly large, 
the approach is still capable of treating systems far be¬ 
yond those accessible to exact diagonalization, and there¬ 
fore there were no significant drawbacks which offset this 
reduction in random error. 


In this article we further investigate and apply the 
semi-stochastic adaptation. In section |IT] we present a 
brief overview of the method and in section IIIII sug¬ 
gest a flexible and relatively black box method for par¬ 
titioning the FCI space. In section IIVI we explain how 
the semi-stochastic adaptation can be implemented in 
a straightforward and efficient manner in an existing 
FCIQMC code. Due to the importance of the efficient 
parallel scaling of FCIQMC we place particular empha¬ 
sis on this aspect. In section El results are presented. 
It is demonstrated that the semi-stochastic adaptation 
need not greatly alter the parallel performance in the 
current regime of applicability, and we present results for 
our method of partitioning the FCI space, both in the 
standard energy estimator and also in the calculation of 
reduced densities matrices within FCIQMC. Finally, the 
semi-stochastic approach is used to study the Beryllium 
dimer, with FI2 corrections calculated from reduced den¬ 
sity matrices. 
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II. THEORY 

The FCIQMC wave function is represented by a col¬ 
lection of walkers which have a weight and a sign and 
reside on a particular many-electron basis state, which if 
not specified can be assumed to be a single Slater deter¬ 
minant. The total signed weight of walkers on a state is 
interpreted as the amplitude of that many-electron basis 
state in the (unnormalized) FCI wave function expansion. 
The FCIQMC algorithm consists of repeated application 
of the projection operator 

P = 1- /\t{H - SI) (1) 

to some initial state, where H is the Hamiltonian opera¬ 
tor, At is some small time step and S is an energy offset 
(‘shift’) applied to the Hamiltonian to control the total 
walker population. With sufficiently small At, exact re¬ 
peated application of P will project the initial state to 
the ground state of Pi—. In FCIQMC, P is applied such 
that the correct projection is only performed on average, 
thus leading to a stochastic sampling of the ground state 
wave function. 

The projection operator can be expanded in the chosen 
FCI basis as 

p = (2) 

ij 

In the semi-stochastic adaptation the set of basis states 
is divided into two sets, D and S. We refer to the space 
spanned by those basis states in D as the deterministic 
space, and refer to the basis states themselves as deter¬ 
ministic states. The terms in Eq. ([2]) can then be divided 
into two separate operators, 

p = pD+pS^ ( 3 ) 

where P^ refers to the deterministic projection operator, 

E ( 4 ) 

i&DJeD 

and P^ is the stochastic projection operator contain¬ 
ing all other terms. In semi-stochastic FCIQMC, P^ 
is applied exactly by performing an exact matrix-vector 
multiplication, while P^ is applied using the stochastic 
FCIQMC spawning steps as usuaL- 

In order to perform an exact projection in the deter¬ 
ministic space, the walker weights must be allowed to be 
non-integers. This differs from most previous descrip¬ 
tions of the FCIQMC algorithm thus far. To be clear 
in notation and terminology, we use Ci to refer to the 
signed amplitude on a state, and Ni to refer to the un¬ 
signed amplitude (and so \Ci\ = Np, which we refer to 
as the weight on the state. 

A complete iteration of semi-stochastic FCIQMC is 
performed as follows, where T = — {Pi — 51): 


1. stochastic projection: Loop over all occupied 
states. Perform Xi spawning attempts from state 
|i), where Xi is specified below. For each spawn¬ 
ing attempt, choose a random connected state \j) 
with probability pij, where connected means that 
Hij = {i\ll\j) p 0 and i p j. The attempt fails 
if both |i) and \j) belong to D, otherwise a new 
walker on state \ j) is created with weight and sign 
given by TjiCiAr/pij. 

2. deterministic projection: New walkers are cre¬ 
ated on states in D with weights and signs equal to 
AtT^C^, where is the vector of amplitudes 
currently on states in D. 

3. death/cloning: Loop over all occupied states in 
S. For each state create a spawned walker with 
weight and sign given by TaCiAr. 

4. annihilation: Combine all newly spawned walkers 
with walkers previously in the simulation by sum¬ 
ming together the amplitudes of all walkers on the 
same state. 

Xi is chosen probabilistically such that its expected value 
obeys E[xi] = Ni. Although other approaches have been 
used^, in this work we set 

Xi = with probability Ni - [A^J, (5) 

= [Ni\ otherwise, (6) 

where |"Ai] denotes rounding up and [A^J denotes round¬ 
ing down. If integer weights are used then this reduces 
to Xi = Ni, as used in previous worb^. 

In order to reduce the memory demands of having a 
large number of states occupied with a low weight, a 
minimum occupation threshold, Nqcc, is defined. After 
all annihilation has occurred, any walkers with a weight 
less than Nocc are rounded up to Nqcc with probability 
Ni/Nocc or otherwise down to 0. In practice, we always 
choose Nocc = I- The occupation threshold is not applied 
to deterministic states so that the deterministic projec¬ 
tion is applied exactly. 

We further use a modification to the initiator adapta¬ 
tion to FCIQMCSi^ by allowing all successful spawning 
events both from and to the deterministic space to sur¬ 
vive. This effectively forces all deterministic states to be 
initiators, which is sensible since these states should se¬ 
lected by their importance (i.e. weight). In the scheme 
used by Petruzielo et al., the initiator threshold was al¬ 
lowed to vary based on the number of steps since a walker 
last visited the deterministic space, and so our approach 
is different (although deterministic states are always ini¬ 
tiators in both schemes). In Supplemental Materiali^ 
we show that our scheme achieves the same qualita¬ 
tive behavior for the Hubbard model as demonstrated 
in Ref. (@). We have not performed a comprehensive 
study of the effect of semi-stochastic on the initiator er¬ 
ror. However, we tend to find that when the number 
of walkers, N^, is much larger than the deterministic 
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space size, the use of semi-stochastic makes little differ¬ 
ence. This is expected because the two approximations 
should be identical in the limit N^/\D\ ^ 1. We also 
note that it is not essential to use both the initiator and 
semi-stochastic adaptations together; the benefits from 
both extensions are largely independent of each other. 
However, all results presented in this article do use the 
initiator adaptation. 

Using non-integers weights can have a significant mem¬ 
ory impact compared to integer weights due to the large 
number of additional spawned walkers, which also in¬ 
creases time demands due to expensive extra processing 
and communication steps. We therefore apply an unbi¬ 
ased procedure to stochastically remove newly-spawned 
walkers with very small weights, similar to that above. 
Following the notation of Overy et aZ.— , we use a spawn¬ 
ing cutoff, K, where k = 0.01 unless stated otherwise. A 
spawning of weight Nj < k is rounded up to k with prob¬ 
ability Nj/k or otherwise down to 0. Spawned walkers 
with weights greater than k are left unaltered. 


III. CHOOSING THE DETERMINISTIC SPACE 

The key to reducing stochastic error within the semi¬ 
stochastic approach is to choose D such that most of the 
weight of the true FCI wave function is in this space. For 
a given number of basis states in the deterministic space, 

11? I, it is expected that the best possible deterministic 
space (the one which reduces noise the most) is obtained 
by choosing the \D\ most highly weighted basis states in 
the exact expansion of the ground-state wave function. 
Achieving this optimal space requires knowledge of the 
exact wave function and so is not feasible in general. 

A sensible choice for D in many systems would be a 
truncation of the FCI space by number of excitation oper¬ 
ators applied to an initial dominant configuration (gener¬ 
ally the Hartree-Fock state), giving the truncated ‘CT ex¬ 
pansion, or alternatively a complete active space (CAS) 
truncation. These are generally regarded as being effec¬ 
tive at describing situations where dynamical and static 
correlation, respectively, are important. We have found 
from experience that such spaces are useful and lead to 
a large reduction in stochastic noise. This leads to the 
question: can one find a better deterministic space, at 
least in common cases? 

Petruzielo et al^ describe an iterative method for 
choosing the deterministic space. First the space con¬ 
nected to the states chosen in the previous iteration is 
generated and the ground state of the Hamiltonian in 
this subspace is calculated. The most significant basis 
states in the ground-state expansion are kept (according 
to a criterion on the amplitude of coefficients). The ini¬ 
tial space contains (e.g.) the Hartree-Fock determinant. 
This process is repeated for some number of iterations. 
This approach was shown to give much greater improve¬ 
ments than by simply using the space connected to the 
Hartree-Fock state, even with a reduced size for D, as it 


can contain the chemically-relevant basis states^. 

In this work we present and use a new method of gen¬ 
erating the deterministic space. Inspired by the spirit 
of FCIQMC, we allow the deterministic space to emerge 
from the calculation itself: we simply perform a fully- 
stochastic FCIQMC calculation (or a semi-stochastic cal¬ 
culation with a simple deterministic space, such as a 
CISD space) until a coarse representation of the ground 
state is deemed to have been reached, and then choose 
the most populated basis states in the FCIQMC wave 
function to form D. Because the semi-stochastic adapta¬ 
tion does not significantly change the rate of convergence, 
and statistics are not accumulated until the ground state 
is reached anyway (where the semi-stochastic adaptation 
is of more benefit), this requires no extra computational 
effort. This approach has the benefit that it does not re¬ 
quire performing an exact ground-state diagonalization 
within a (potentially large) subspace, which can be very 
expensive. The only parameter is the desired determin¬ 
istic space size and it is therefore also a relatively black 
box approach. 

Although the FCIQMC wave function is only a 
stochastic snapshot of the true ground state, the most sig¬ 
nificant basis states in the expansion will tend to remain 
highly occupied throughout the simulation with weights 
fluctuating about their exact values. The FCIQMC simu¬ 
lation naturally picks out chemically-important determi¬ 
nants, even when deep in the Hilbert space (on quadru¬ 
ple, sextuple and higher excitation levels), and so our pro¬ 
cedure can select close-to-optimal deterministic spaces in 
a very inexpensive manner. For very large deterministic 
spaces, states with the smallest occupation weights may 
be included in the space. In this case there is some re¬ 
dundancy in how D is chosen, and the choice of D will 
probably not be optimized fully, although we still find 
this approach to work very well. It is simple to include 
a cutoff to avoid this if desired, although we do not do 
so in the calculations presented here. With our approach 
we avoid the need for diagonalization steps, which would 
become unfeasible for large D and as the connectivity of 
the Hamiltonian grows. For instance, in some previous 
applications of FCIQMC the number of connections to 
the Hartree-Fock has been (!I[I0® — 


IV. IMPLEMENTATION DETAILS 

An in-depth description of our FCIQMC implementa¬ 
tion is given in Ref. (|I4ll : we present here only the addi¬ 
tions to that algorithm required for the semi-stochastic 
implementation. Our implementation of FCIQMC is par¬ 
allelized using MPI. A given basis state is assigned to 
a particular MPI process, which performs all spawning 
from that basis state. Deterministic and stochastic states 
are treated equally in this respect. 

Iterative diagonalization methods, such as the David¬ 
son or Lanczos methods, typically require at most a few 
tens of iterations. Given the desire to treat as large a 
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system as possible and the memory cost of storing even 
a compressed form of H, many deterministic subspace 
methods use the direct Cl approach of constructing Hv 
as neededi^. In contrast, FCIQMC calculations regularly 
require on the order of 10^ — 10® iterations. Thus, it is of 
critical importance that each multiplication by the (com¬ 
paratively small) deterministic Hamiltonian is very fast. 
Storing the Hamiltonian, which speeds up this multipli¬ 
cation considerably, is therefore worthwhile and feasible. 
Because FCIQMC scales well to large numbers of proces¬ 
sors, a large amount of distributed memory is typically 
available and one is usually far more time-limited than 
memory-limited. As such, we have not found this mem¬ 
ory requirement to become an issue. 


The deterministic Hamiltonian is stored in a sparse 
matrix format and split across processes so that, if |i) 
belongs to an MPI process, then all non-zero elements 
{i\H\j), \j) G D (i.e. the entire compressed row) are 
also stored in memory on that process. In contrast the 
walker amplitude for basis state |*) is only stored in mem¬ 
ory for the process to which |z) belongs. We gather 
the amplitudes of the deterministic basis states via an 
MPI_AllGatherV call and perform the deterministic pro¬ 
jection via a sparse matrix multiplication on each MPI 
process. Our implementation therefore requires one ad¬ 
ditional parallel communication per iteration compared 
to the standard FCIQMC algorithm. 


For the most part, deterministic states are treated 
in the same manner as non-deterministic states. Be¬ 
cause the stochastic and deterministic spaces are cou¬ 
pled, stochastic spawning attempts must still be per¬ 
formed from D. However, because the deterministic- 
to-deterministic projection is performed exactly, such 
stochastic attempts should not create new walkers in¬ 
side D. For simple deterministic spaces, such as Cl and 
CAS spaces, it is possible to create excitation generators 
which never create deterministic-to-deterministic spawn¬ 
ings. This is not feasible for more general spaces, such as 
in the schemes outlined in the previous section. Instead, 
we remove any walkers stochastically spawned from a 
state in D to another state in D. This check is effi¬ 
ciently performed by using a hash table (similar to that 
used for annihilation in FCIQMCi^) of the deterministic 
space, such that the test of whether the basis state is 
in D can be performed in 0[1] time. The extra memory 
required to store the hash table is usually negligible com¬ 
pared to other memory requirements, such as that of the 
deterministic Hamiltonian. It is also partially compen¬ 
sated by the fact that a smaller number of (stochastic) 
spawned walkers are accepted, and so memory demands 
of the spawned list are decreased. 


V. RESULTS 
A. Parallel performance 

Figure [T] presents the parallel speed-up for the 
chromium dimer (bond length I.bA, SV basis, CAS 
(24,30)) from 24 to 1152 cores on ARCHER, a Cray 
XC30. This system has a Hilbert space size of ^ 
and approximately 2 x 10® walkers were used in each 
simulation (sufficient to converge the initiator error to 
high accuracy). We consider FCIQMC calculations us¬ 
ing both integer weights and non-integer weights, and 
semi-stochastic FCIQMC using D = 100 and D — 10®. 
It is apparent that the scaling quality reduces somewhat 
by using non-integer coefficients. However, there is al¬ 
most no further decrease in quality when using the semi¬ 
stochastic adaptation. In fact, the scaling is slightly 
improved when using a deterministic space size of 10®. 
The semi-stochastic initialization times, usually negligi¬ 
ble compared to the total calculation time, were not in¬ 
cluded in these results so that the scaling curves did not 
depend on the number of iterations performed. 

By far the largest cause of loss in parallel efficiency in 
FCIQMC is poor load balancing. The number of ba¬ 
sis states and walkers assigned to each process is not 
precisely constant. Each process therefore takes a dif¬ 
ferent amount of time to complete each iteration. The 
slowest process acts as a bottleneck for other processes 
with less work, which must synchronize before parallel 
communication can be performed. It is found that using 
non-integer coefficients somewhat exacerbates this issue, 
leading to the loss of parallel efficiency seen in figure [T] 
This worsening of load balancing with non-integer coef¬ 
ficients is primarily due to the greatly increased num¬ 
ber of spawning events that are received, and must be 
processed, by processes with already-heavy loads. As 
expected, communication time is also increased by us¬ 
ing non-integer weights due to the extra spawning events 
that must be sent and received, but this time still remains 
quite small compared to the synchronization time. 

Figured] shows similar results for the 18-site 2D Hub¬ 
bard model, with solid (dashed) lines representing U/t = 
1 {U/t = 8). All calculations used approximately 5 x 10^ 
walkers. The load balancing for U/t = 8, which has a 
highly delocalized wave function in the Bloch basis, is 
very good. Excellent (essentially identical) parallel scal¬ 
ing is seen in all cases. The U/t = 1 system, however, is 
heavily dominated by the Hartree-Fock state. Therefore, 
the process to which this state belongs takes much longer 
to complete each iteration than other processes. This 
problem is exacerbated as the process count increases, 
and the parallel performance is very poor. It is found that 
using the semi-stochastic adaptation slightly improves 
this performance, as the Hartree-Fock process has far 
fewer successful spawning events to perform calculations 
on, due to many deterministic-to-deterministic stochastic 
spawning events being canceled. Despite this, the most 
expensive steps are the spawning attempts, which are 
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FIG. 1. Speed-up as the number of the MPI processes is 
increased from 24 to 1152, for calculations performed on the 
chromium dimer in an SV basis, with approximately 2 x 10® 
walkers used. Scaling worsens with the use of non-integer 
weights, primarily due to exaggerating the poor balancing of 
work among processes. The semi-stochastic adaptation does 
not greatly alter the scaling further. 


still performed, and so the load balancing is still poor. 

However, if all states connected to the Hartree-Fock 
through a single application of H are included in the 
deterministic space, then all stochastic spawnings from 
the Hartree-Fock state will be canceled. As such there 
is no need to attempt any spawning from the Hartree- 
Fock state. When this change is made, the parallel per¬ 
formance improves dramatically. This suggests that us¬ 
ing deterministic approaches to treat the most heavily- 
weighted states may be a very effective way to improve 
the parallel performance of FCIQMC. This approach has 
not been used for any further results in this article, but 
will be an area of research going forward. 


B. Hartree-Fock energy estimator 


We first present efficiency increases for the standard 
Hartree-Fock-based projected energy estimator: 


(UhfI'I') 


(7) 


where |Z1hf) is the Hartree-Fock state and Ilk) is wave 
function represented by the FCIQMC walkers. The effi¬ 
ciency measure that we consider is the same one used by 
Petruzielo et al^, 


e 


1 


( 8 ) 


where T is the total simulation time (excluding initial¬ 
ization timei^) and is the final energy error esti¬ 
mate, obtained by averaging multiple estimates taken 
from throughout the simulation. This is a sensible ef¬ 
ficiency measure because the error of such an average 


FIG. 2. Speed-up as the number of the MPI processes is 
increased from 24 to 96, for calculations performed on the 18- 
site Hubbard model with approximately 5 x 10^ walkers. Solid 
lines show results a.t Ujt — 1 and dashed lines show U/t = 8. 
At t//t = 1 scaling is very poor due the extremely large num¬ 
ber of walkers on the Hartree-Fock state. The scaling is im¬ 
proved slightly with the use of semi-stochastic, as far fewer 
spawnings from the Hartree-Fock state survive. If all connec¬ 
tions to the Hartree-Fock are included in the deterministic 
space then stochastic spawning from the Hartree-Fock does 
not need to be performed. With this change the load bal¬ 
ancing is greatly improved. At U/t = 8 the scaling is much 
better, and little difference is made by the semi-stochastic 
adaptation. 


can be estimated via 




cr 

71 ’ 


(9) 


where N is the number of uncorrelated estimates con¬ 
tributing to a mean estimate (possibly taken from a 
blocking analysisii), and a is the true standard devia¬ 
tion of the distribution from which each estimate is taken, 
which is constant at equilibrium. N scales linearly with 
T, and therefore this efficiency measure is appropriate. 

Caution should be taken in interpreting the relative 
efficiency of two simulations, however. An increase of ef¬ 
ficiency of X with semi-stochastic does not necessarily 
mean that a particular value of cr^ can always be ob¬ 
tained with X times less simulation time by using the 
semi-stochastic adaptation. For the efficiency results pre¬ 
sented here, all error estimates are calculated through a 
blocking analysisii in order to take account of the seri¬ 
ally correlated nature of the FCIQMC wave function be¬ 
tween subsequent iterations. Such an analysis typically 
requires a large number of iterations (depending on the 
system and time step used) in order to obtain an accu¬ 
rate error estimate. Because using semi-stochastic does 
not seem to reduce the auto-correlation time, a similar 
number of iterations must be performed. Thus after a 
sufficient number of iterations, the error obtained even 
without the use of the semi-stochastic adaptation may be 
suitably small. In such cases, the overhead of performing 
the deterministic projection is not particularly advanta¬ 
geous. In most cases, however, more than a small number 
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FIG. 3. The efficiency {^Eq) of semi-stochastic simula¬ 
tions relative to otherwise identical simulations without semi¬ 
stochastic, for an 18-site Hubbard model. For small U/t val¬ 
ues, where the wave function is dominated by a small number 
of basis states, the semi-stochastic adaptation helps greatly, 
leading to an efficiency increase of over 4.5 x 10"^ at t//t=0.25 
and |II| = 10®. For larger U/t values the benefit becomes less 
significant. 

of independent blocks of data are required, and so using 
the semi-stochastic approach is highly beneficial. 

We present results for a variety of systems to demon¬ 
strate the general applicability of semi-stochastic and our 
method for generating the deterministic space. All cal¬ 
culations were performed with 10® walkers unless stated 
otherwise, and the deterministic space was generated us¬ 
ing the new scheme outlined in section III. The relative 
efficiencies presented are relative to otherwise identical 
simulations that do not use the semi-stochastic adapta¬ 
tion (but do use non-integer coefficients, with the same 
value of spawning cutoff, k). All calculations for each 
system were performed on the same machine and num¬ 
ber of CPU cores (between 24 and 96). Simulations were 
typically performed for between 10® and 10® iterations. 

In each case the time step was set just small enough to 
avoid the possibility of bloom events, where more walkers 
than the initiator threshold are created from one spawn¬ 
ing attempt. Such events are undesirable because they 
instantly become initiators, increasing the associated ini¬ 
tiator error as a consequence. 

Figure [3] shows the efficiency of semi-stochastic 
FCIQMC for the 18-site 2D Hubbard model at a variety 
of coupling strengths, from U/t = 0.25 to U/t = 4, and 
for deterministic space sizes ranging from 10^ to 10®. Sig¬ 
nificant increases in efficiency are observed in all cases, 
with the most significant gains occurring at small Ujt. 
This is expected: at small coupling strengths the wave 
function is dominated by a small number of significant de¬ 
terminants which are treated exactly by the deterministic 
space, whereas at large U/t the equivalent deterministic 
space will be significantly less occupied. 

Table U contains results for two molecular systems, 
N 2 in a cc-pVDZ basis (with 4 core electrons uncorre¬ 
lated) and Be 2 in a cc-pCVTZ basis, at equilibrium and 


\D\ 

N 2 



Be2 


Equilibrium 

Stretched 

2.45A 

3.0A 

5.0A 

10^ 

6.0 

30.6 

2.5 

3.9 

2.6 

10® 

45.3 

127.0 

8.6 

13.4 

11.3 


283.4 

2793.5 

50.4 

90.4 

103.1 

10® 

1550.4 

4710.4 

218.3 

765.4 

560.5 


TABLE I. The efficiency (ebo) of semi-stochastic simulations 
relative to an otherwise identical simulation without semi¬ 
stochastic. Results are shown for N 2 in a cc-pVDZ basis with 
4 core electrons uncorrelated, at equilibrium (2.118ao) and 
stretched (10.4ao) geometries, and also for Be 2 in a cc-pCVTZ 
basis at equilibrium and two stretched geometries. A signif¬ 
icant increase in efficiency is found for all geometries, and a 
monotonic improvement with |D| is always observed. 


stretched geometries. Once again significant improve¬ 
ments are observed, even at stretched geometries where 
the wave function is more multi-reference. 

One might expect that as the number of walkers in¬ 
creases, the improvement gained from semi-stochastic 
would decrease. This seems reasonable because, as the 
walker number is increased, the stochastic spawning will 
approach exact projection. Interestingly, we find the op¬ 
posite behavior. In figure |4] the relative efficiency is pre¬ 
sented for the homogeneous electron ga a^^d®d® with 14 
electrons, 114 plane wave spin orbitals and rg = 1.0 a.u., 
as the walker population is varied from 10® to 10®. For 
deterministic space sizes of 10^ and 10® a significant in¬ 
crease in stochastic efficiency is observed as the walker 
population is increased. For |D| = 10® the increase is less 
significant, becoming approximately constant for popu- 



FIG. 4. The efficiency (eso) of semi-stochastic simulations 
relative to an otherwise identical simulation without semi¬ 
stochastic, for the 14-electron homogeneous electron gas with 
114 spin orbitals and rs = 1.0 a.u., as the walker population 
is varied. It is found that the benefit of semi-stochastic tends 
to increase as the walker population increases, contrary to a 
simplistic intuition that there should be diminishing returns 
as stochastic error decreases due to the improved stochastic 
sampling. 
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lations from 10® to 10®. These results suggest that the 
semi-stochastic adaptation will continue to be beneficial 
for very large systems and walker populations. It is hard 
to isolate precisely why these benefits increase, but vari¬ 
ous factors such as the sign problem and subtleties in the 
impact of the initiator criterion are likely to play a role. 
These are also expected to be very system-dependent. 


C. Reduced density matrix estimators 

A recent article by Overy et al^ introduced a method 
of unbiased sampling for reduced density matrices in 
FCIQMC. In this approach a replica samplin g^^i^®~ — is 
used, where two independent FCIQMC simulations are 
performed simultaneously, each starting from a different 
random number seed. Because these two simulations are 
statistically independent, it is possible to sample quanti¬ 
ties that depend quadratically on the ground-state wave 
function without introducing a bias. In particular, the 
components of the second-order reduced density matrix 
can be expressed as 

^pq,rs — I ^) , (^0) 

= ^CiCj{D,\alalasar\Dj). ( 11 ) 

This can be estimated in FCIQMC via 

Tpq^rs = ClCf{D,\alalasar\D,). ( 12 ) 

ij 

where and are the walker amplitudes coming from 
simulations 1 and 2 , respectively^^, and p, q, r and s 
denote spin orbital labels. 

In the implementation used for the results in this 
article (NECI— ), diagonal elements (Tpg^pq) are calcu¬ 
lated exactly for the pair of FCIQMC wave functions 
used. For non-diagonal elements of Tpg^rs, contri¬ 
butions in Eq. (HU including the Hartree-Fock state 
{C^pCj{Diip\a^a}jasar\Dj)) are always included in the 
estimate, while all other contributions are stochastically 
sampled alongside the stochastic samp ling of the Hamil¬ 
tonian operator, as described in Ref. (till ). Thus, there 
are two sources of error contributing to each estimate 
of the 2-RDM: the random sampling of the ground-state 
wave function, and also the random sampling of the 2 - 
RDM given these wave functions. 

In the semi-stochastic adaptation we modify the esti¬ 
mation of the 2-RDM by also always including all contri¬ 
butions between states in the deterministic space. That 
is, the contribution 

ClC]{D,\alalasar\D,) (13) 

is always included if both \Di) and \Dj) belong to D. 
This is achieved by storing a further array, roughly the 
same size as the deterministic Hamiltonian, which stores 



FIG. 5. The efficiency (£^ 52 ^) of semi-stochastic simulations 
relative to an otherwise identical simulation without semi¬ 
stochastic, for an 18-site Hubbard model. This efficiency mea¬ 
sure uses the estimate of (S^) obtained from stochastically- 
sampled RDMs. The stochastic efficiency is seen to improve 
in a manner similar to esg, although the improvement is even 
greater. 


the excitation levels and coupling parities of pairs of con¬ 
nected states in the deterministic space. This then speeds 
up the calculation of these contributions, although these 
quantities could easily be calculated on-the-fly to save 
memory. Because both \Di) and \Dj) belong to D, C] 
and C'j should both be large and so semi-stochastic nat¬ 
urally picks out the large contributions to the RDM. Be¬ 
cause each process stores all deterministic amplitudes (as 
a result of the MPI_AllGatherV call during the determin¬ 
istic projection step) no extra communication is required. 
We therefore emphasize that the semi-stochastic adapta¬ 
tion improves the RDM estimates in two ways, firstly by 
improving the underlying FCIQMC wave functions and 
secondly by improving the sampling of the 2-RDM given 
these wave functions. We note that in our implemen¬ 
tation the 1-RDM contribution is calculated from the 
2-RDM, and therefore also benefits from the improved 
calculation of off-diagonal elements of Tpg^rs ■ 

Two separate quantities are considered to study the 
quality of these RDM estimates. The first is the varia¬ 
tional energy estimate 

Ardm = (14) 

— ^ ^ ^pq^pq T ^ ^ ^pq,rs{pq\\'^^} 4” ^nucj 

pq p>q,r>s 

(15) 

where 'jpg = ('I'|a|,aq|'I') is the 1-RDM. This should be 
an important quantity in FCIQMC because it is varia¬ 
tional, whereas the energy obtained in i-FCIQMC from 
equation © is not. The second quantity considered is 
the expectation value of S'^, which for spin- 1/2 particles 
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N2 
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N2 
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\D\ 

Equilibrium 

2.45A 

3.0A 

5.0A 

Equilibrium 2.45 A 

3.0A 

5.0A 

10^ 

1.82 

3.57 

2.03 

2.55 

4.23 

9.20 

18.57 

6.89 

10^ 

46.12 

6.69 

11.39 

12.68 

28.49 

411.56 

524.27 

89.19 


434.98 

109.58 

35.67 

43.23 

172.95 

2363.88 

8299.70 

933.13 

10® 

855.23 

231.12 

234.48 

1033.02 

1370.06 

5877.94 

11337.21 

1627.39 


TABLE II. Relative efficiencies in the RDM estimates of energy and (S^) for N 2 (in a cc-pVDZ basis with 4 electrons uncor¬ 
related, at a separation of 2.118ao) and Be 2 in a cc-pCVTZ basis (with all electrons correlated). Large improvements are seen 
for all systems, geometries and estimators, with a monotonic increase with \D\ in each case. 


can be calculated as 

IJ 

1 3 

— -j^Iajpjajp — ^ Iajp,Jaip\ + , (16) 

where N is the number of particles and / and J are spa¬ 
tial orbital labels. 

In figure [S] the relative efficiency is shown for the same 
Hubbard systems used in figure [31 but using the estimates 
of {S'^) from the 2-RDM. Results from these two figures 
used the same parameters but were taken from different 
simulations. RDM estimates were calculated every 200 
iterations and a blocking analysis was performed. Once 
again, substantial improvements are found, with an effi¬ 
ciency increase of over 10® observed for U/t = 0.25. Some 
remarkably accurate results are obtained. It is known 
that the exact value of ('I'|5^|'I') should be 0 for the 
ground state of this system. For U/t = 0.25, estimates 
of (T'j.S^I'I') change from 3.1 x 10“^ ± 2.5 x 10“^ for the 
fully stochastic formulation, to —1.3 x 10“^®± 1.4 x 10“^® 
for \D\ = 10®. 

Table m shows relative efficiencies for molecular sys¬ 
tems, for both the variational energy and spin estima¬ 
tors. Once again, significant improvements are seen in 
all cases, and there is always an improvement with in¬ 
creasing \D\ for the range of deterministic space sizes 
considered here. This suggests using a large determin¬ 
istic space is sensible, although this has to be weighed 
against increasing memory requirements. 

D. Be 2 F12 results 

As a further demonstration of the benefits of semi¬ 
stochastic, we consider the calculation of an explicitly 
correlated correction to the basis set incompleteness error 
in Be 2 . The explicitly correlated ‘F12’ approach was first 
proposed in 1985 ^®i^® , and has since been refined22r— to 
become a standard tool to accelerate basis set conver¬ 
gence in quantum chemistr y®^'®® . The aim of this ap¬ 
proach is to complement the traditional wave function 
expansion (in Slater determinants) by a small set of func¬ 


tions which have an explicit dependence on the inter- 
electronic distance. These ‘geminal’ functions are crucial 
for an accurate description of the electronic cusps. 

In this work, rather than optimizing the sampled 
wave function in the presence of the explicitly correlated 
geminal functions, they are instead coupled after the 
FCIQMC calculation via an internally contracted mul¬ 
tireference perturbative approach ([2]i{i2). This method 
was first proposed by Torheyden et and first ap¬ 

plied to FCIQMC in Ref. (fSo) . This approach allows the 
calculation of the correction through the sampled one and 
two-body density matrices (after some rank-reducing ap¬ 
proximations) . The quality of these corrections will pro¬ 
vide a further demonstration of the accuracy of FCIQMC 
reduced density matrices when using the semi-stochastic 
approach. It should be noted that in previous applica¬ 
tions of this method to FCIQMC, the results were with¬ 
out the semi-stochastic adaptation and approximated the 
RDMs without the use of a replica sampling (in a bi¬ 
ased fashion)^. An alternative explicitly correlated ap¬ 
proach, where the Hamiltonian is ‘transcorrelated’ via an 
approximate many-body canonical transformatioii^ has 
also been used within the FCIQMC framework)^ While 
it is still unclear which is the optimal strategy within 
FCIQMC, in this work we focus on the a posteriori ap¬ 
proach, in order to demonstrate the accuracy of the sam¬ 
pled RDMs with the semi-stochastic adaptation. 

Be 2 is a very weakly bound molecule which has resulted 
in extensive study within the literature, both in theo¬ 
retical and experimental investigations^^^— . The small 
binding energy is a stern test for FCIQMC, as it re¬ 
quires careful control over stochastic errors on the order 
of /ri?h for reliable results. These random errors limited 
the accuracy in the previous FCIQMC study of Ref. ((H) , 
and as such we expect the improvement due to the semi¬ 
stochastic approach to be important. 

An all-electron semi-stochastic FCIQMC calculation 
was performed within the cc-pCVDZ-F12 basis set to 
calculate 2-RDM estimates. From these 2-RDM esti¬ 
mates, [2 ]j^i2 corrections were calculated via the MPQC 
code^. The binding in Be 2 is primarily due to dispersion 
interactions, and requires very high angular momentum 
atomic orbitals for an accurate description. As such, it 
is not to be expected that the cc-pCVDZ-F12 basis set 
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(which only contains up to d angular momentum orbitals) 
will provide high quality results. Rather, this system is 
used to demonstrate the very great accuracy with which 
[ 2 ]fl:i 2 corrections can be calculated when using the semi¬ 
stochastic approach. 

The FCIQMC calculations used time-reversal sym¬ 
metrized functionsi^i^i^ as basis states, as a compro¬ 
mise between Slater determinants and full configuration 
state functions. The total space size was ^ 4 x 10^° ba¬ 
sis states. The deterministic space consisted of 3 x 10"'’ 
states, chosen using our scheme presented in section UlIl 5 
repeats were performed for each geometry (starting from 
different random number generator seeds) to provide (un¬ 
correlated) estimates with which to estimate error bars 
on the [ 2 ]/ii 2 corrections, and also on FIrdm estimates. 
Approximately 6 x 10® walkers were used for each cal¬ 
culation. This was found to reduce initiator error to no 
greater than roughly lOfiEh at each bond length, with 
such accuracy being deemed necessary for this weakly- 
bound system. An optimal value of the single param¬ 
eter in the F12 geminal (the 7 exponent) was found by 
minimizing the [ 2 ]ri 2 correction at equilibrium geometry, 
yielding an optimized 7 = 2.44aQ 

In figured the [ 2 ]ri 2 contributions are shown. These 
values are plotted relative to the correction at infinite 
separation. This infinite separation value was calculated 
by using the exact FCI 2-RDM for the Be atom, and 
therefore has no stochastic or systematic error. It is 
therefore very encouraging that our FCIQMC results con¬ 
verge to this value so accurately. Error bars are not vis¬ 
ible on the plot, all being less than 0 . 12 cm“'^ {0.5fj,Eh). 
This demonstrates that the semi-stochastic adaptation 
works well for these [ 2 ]ri 2 corrections, as for the RDM- 
based quantities already considered. 

In figure[7]the variational energy calculated from the 2- 
RDM estimates (using Eq. (fTSl) ') is shown, together with 
the total energy, including both Erdm and [ 2 ]ri 2 con¬ 
tributions. It is seen that the [ 2 ]ri 2 corrections greatly 
reduce the basis set incompleteness. However, for this 
system the cc-pCVDZ-F12 basis, despite the addition 
of the explicitly correlated correction, is still not suffi¬ 
cient to obtain results compatible with the most accu¬ 
rate estimate s"'®’^-' of the well depth, which are around 
930cm“^. 

Interestingly, the [ 2 ]ri 2 correction for this system is 
not as significant as for many previously-studied sys¬ 
tems. It has been demonstrated that results of aug- 
cc-pVQZ quality can sometimes be obtained within an 
aug-cc-pVDZ basis^i. The small improvement here is 
perhaps explained by the F12 corrections being less 
effective at describing bonding via dispersion interac¬ 
tions. For the Be atom, an accurate variational en¬ 
ergy is —14.66736By including both [ 2 ]_ri 2 and 
[2]s corrections, the cc-pCVDZ-F12 Be energy goes from 
— 14.6574Eft to —14.6691i?;j (with 7 = 2AAaQ^). Thus, 
the accuracy of the atom energy is greatly improved, al¬ 
though interestingly it appears that the energy is below 
the complete basis set limit by ~ 2mEh (possible due 



Bond length/ao 

FIG. 6 . The [ 2 ]i{i 2 energies (relative to the FCI value at 
infinite separation) for Be 2 in a cc-pCVDZ-F12 basis set, with 
all electrons correlated and 7 = 2.44a^'^. Error bars, which 
are each calculated from 5 independent estimates, are plotted 
but not visible. All error estimates are less than 0.12cm“'^ 
{0.5iJ,Eh)- It is seen that the result at large bond lengths 
approaches the FCI result with very great accuracy. 



Bond length/ao 

FIG. 7. Be 2 binding curve in a cc-pCVDZ-F12 basis set, cal¬ 
culated by combining Ardm and [ 2 ]i{i 2 contributions. All 8 
electrons are correlated and 7 = 2.440,/^. The addition of 
the [ 2 ]_ri 2 basis set incompleteness correction improves the 
energy estimate. However, the system is still under bound 
by around 450cm“^ compared to experimental values, sug¬ 
gesting that larger basis sets (with high angular momentum 
functions) are still required. The relatively small improve¬ 
ment of the [ 2 ]i{i 2 correction for this system and basis set is 
not related to the use of FCIQMC. Error bars, which are each 
calculated from 5 independent estimates, are plotted but not 
visible. All errors estimates are less than 1 . 8 cm“^ {8fiEh)- 


to this perturbative framework). The improvement to 
the Be 2 binding energy is less significant. We emphasize 
however, that relatively small improvement of [ 2 ]ri 2 in 
this case is not a result of using FCIQMC. In particu¬ 
lar, the semi-stochastic adaptation is found to perform 
extremely well, with all stochastic errors being less than 
1 . 8 cm“^ {8^Eh). 
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VI. CONCLUSION 

We have performed a detailed study of the semi¬ 
stochastic adaptation to FCIQMC and presented a new 
method for generating the deterministic space. This ap¬ 
proach creates the space naturally from the dominant 
states in the FCIQMC wave function and avoids hav¬ 
ing to perform multiple large and time-consuming de¬ 
terministic ground-state calculations. Using this ap¬ 
proach greatly improves the stochastic efficiency of calcu¬ 
lations for a large range of systems and parameters. We 
also demonstrated that a simpler approach to the initia¬ 
tor approximation in semi-stochastic FCIQMC gives the 
same benefits as the approach previously suggested by 
Petruzielo et al^. We have also explained how the semi¬ 
stochastic adaptation can be implemented with relative 
ease in an existing parallel FCIQMC code. It has been 
shown that the parallel scaling is not significantly wors¬ 
ened for a range of numbers of CPU cores. Rather, for 
systems where parallel efficiency is particularly poor, it 
has been shown that deterministic approaches can signif¬ 
icantly improve the parallel performance and speed up 
FCIQMC calculations. This is a significant result and 
deserves further investigation. 

The benefit of the semi-stochastic approach was 
demonstrated for FCIQMC estimates of the 2-RDM. This 
is significant as we expect this object to be important in 
future applications of FCIQMC, as it can be used for 
calculating non-trivial two body operators, variational 
energy estimates and F12 basis set incompleteness cor¬ 
rections. We observed improvements in efficiency to fac¬ 
tors in excess of 1 million. As a concrete demonstration, 
these improvements were applied to the calculation of 
F12 corrections for Be 2 . We therefore suggest that using 
the semi-stochastic adaptation should become standard 
practice when performing FCIQMC calculations. 
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